Systems for person re-identification (ReID) can achieve high accuracy when trained on large, fully-labeled image datasets. However, the domain shift typically associated with diverse operational capture conditions (e.g., camera viewpoints and lighting) may translate to a significant decline in performance. This paper focuses on unsupervised domain adaptation (UDA) for video-based ReID - a relevant scenario that is less explored in the literature. In this scenario, the ReID model must adapt to a complex target domain defined by a network of diverse video cameras based on tracklet information. State-of-the-art methods cluster unlabeled target data, yet domain shifts across target cameras (sub-domains) can lead to poor initialization of clustering methods that propagates noise across epochs, thus preventing the ReID model from accurately associating samples of the same identity. In this paper, a UDA method is introduced for video person ReID that leverages knowledge of video tracklets and of the distribution of frames captured over target cameras to improve the performance of CNN backbones trained using pseudo-labels. Our method relies on an adversarial approach, where a camera-discriminator network is introduced to extract discriminant camera-independent representations, facilitating the subsequent clustering. In addition, a weighted contrastive loss is proposed to leverage the confidence of clusters and mitigate the risk of incorrect identity associations. Experimental results obtained on three challenging video-based person ReID datasets - PRID2011, iLIDS-VID, and MARS - indicate that our proposed method can outperform related state-of-the-art methods. Our code is available at: \url{https://github.com/dmekhazni/CAWCL-ReID}
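A minimal sketch of the two ideas named in the abstract, assuming a PyTorch setting: a gradient-reversal camera discriminator that encourages camera-independent backbone features, and a contrastive loss whose per-sample terms are weighted by the confidence of the assigned cluster. This is an illustration under stated assumptions, not the authors' released implementation; all names (camera_adversarial_loss, weighted_contrastive_loss, cluster_conf, lamb) and the confidence-weighting scheme are hypothetical.

```python
import torch
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reversed (scaled) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None


def camera_adversarial_loss(features, camera_ids, discriminator, lamb=1.0):
    """Camera-classification loss on gradient-reversed features: the discriminator
    learns to predict the capture camera, while the backbone learns to hide it."""
    reversed_feat = GradReverse.apply(features, lamb)
    return F.cross_entropy(discriminator(reversed_feat), camera_ids)


def weighted_contrastive_loss(features, pseudo_labels, centroids, cluster_conf,
                              temperature=0.07):
    """InfoNCE-style loss over cluster centroids, with each sample weighted by the
    confidence of its assigned cluster to limit the impact of noisy pseudo-labels.

    features:      (B, D) L2-normalized tracklet embeddings.
    pseudo_labels: (B,)   cluster index assigned to each embedding.
    centroids:     (K, D) L2-normalized cluster centroids.
    cluster_conf:  (K,)   confidence in [0, 1] per cluster (e.g., from compactness).
    """
    logits = features @ centroids.t() / temperature
    nll = -F.log_softmax(logits, dim=1)[torch.arange(features.size(0)), pseudo_labels]
    weights = cluster_conf[pseudo_labels]
    return (weights * nll).sum() / weights.sum().clamp(min=1e-6)
```

In such a setup, the total objective would presumably combine the adversarial term with the weighted contrastive term, re-running clustering each epoch to refresh pseudo-labels and cluster confidences.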
Despite the recent success of deep learning architectures, person re-identification (ReID) remains a challenging problem in real-world applications. Several unsupervised single-target domain adaptation (STDA) methods have recently been proposed to limit the decline in ReID accuracy caused by the domain shift that typically occurs between source and target video data. Given the multimodal nature of person ReID data (due to variations across camera viewpoints and capture conditions), training a common CNN backbone to address domain shifts across multiple target domains can provide an efficient solution for real-time ReID applications. Although multi-target domain adaptation (MTDA) has not been widely addressed in the ReID literature, a straightforward approach consists in blending the different target datasets and performing STDA on the mixture to train a common CNN. However, this approach may lead to poor generalization, in particular when blending a growing number of distinct target domains to train a smaller CNN. To alleviate this problem, we introduce a new MTDA method based on knowledge distillation (KD-ReID) that is suitable for real-time person ReID applications. Our method adapts a common lightweight student backbone CNN over the target domains by distilling from multiple specialized teacher CNNs, each one adapted to the data of a specific target domain. Extensive experiments conducted on several challenging person ReID datasets indicate that our approach outperforms state-of-the-art methods for MTDA, including blending methods, in particular when training a compact CNN backbone like OSNet. Results suggest that our flexible MTDA approach can be used to design cost-effective ReID systems for real-time video surveillance applications.
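A minimal sketch of the multi-teacher distillation idea described above, assuming one frozen teacher backbone per target domain and a shared lightweight student: the student's features are aligned to the matching teacher on batches drawn from that teacher's domain. The loss form (MSE on normalized features) and all names are illustrative assumptions, not KD-ReID's exact recipe.

```python
import torch
import torch.nn.functional as F


def multi_teacher_distillation_loss(student, teachers, batches):
    """
    student:  shared lightweight backbone (e.g., OSNet) mapping images -> (B, D) features.
    teachers: list of frozen domain-specific backbones; teachers[d] is adapted to domain d.
    batches:  list of image tensors; batches[d] is sampled from target domain d.
    """
    loss = 0.0
    for teacher, images in zip(teachers, batches):
        with torch.no_grad():
            t_feat = F.normalize(teacher(images), dim=1)  # target representation from the domain expert
        s_feat = F.normalize(student(images), dim=1)      # student representation on the same images
        loss = loss + F.mse_loss(s_feat, t_feat)          # pull the student toward this teacher
    return loss / len(teachers)
```

Averaging over domains in each step is a simplification; alternating between teachers across iterations would be another way to realize the same multi-teacher scheme.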